Efficient and effective OCR engine training
Similar resources
Efficient OCR Training Data Generation with Aletheia*
We present how the ground-truthing tool Aletheia can be used to efficiently create training data for an open-source text recognition engine. The labelling process is sped up considerably through a top-down approach. Text content is thereby entered on region level. The characters are then propagated automatically to glyph objects. In addition, segmentation is simplified by several semi-automated...
Towards an Efficient and Effective Search Engine
Building an efficient and effective search engine requires both science and engineering. In this paper, we discuss the ATIRE search engine developed in our research lab, and both the engineering decisions and research questions that have motivated building ATIRE.
Hairetes: A Search Engine for OCR Documents
In this paper, we report on the architecture and preliminary implementation of our search engine, Hairetes. This engine is based on an extended concept of Retrieval by General Logical Imaging (RbGLI). In this extension, word similarity measures are computed by EMIM and Bayes’ theorem.
OCR with No Shape Training
We present a document-specific OCR system and apply it to a corpus of faxed business letters. Unsupervised classification of the segmented character bitmaps on each page, using a “clump” metric, typically yields several hundred clusters with highly skewed populations. Letter identities are assigned to each cluster by maximizing matches with a lexicon of English words. We found that for 2/3 of t...
Distributed Classifier Training for Large Scale OCR
OCRopus (www.ocropus.org) is a new open source OCR system targeted at large books scanning and digital library applications, sponsored by Google for use in the Google Book system. Development started in 2007, with a beta release planned for April 2008. It is based on an earlier handwriting recognition system for U.S. Census forms. OCRopus currently contains two character recognizers (experimen...
Journal
Journal title: International Journal on Document Analysis and Recognition (IJDAR)
Year: 2019
ISSN: 1433-2833,1433-2825
DOI: 10.1007/s10032-019-00347-8